Register pprof HTTP handlers on an internal port, use go tool pprof to collect CPU, heap, and goroutine profiles, and visualize with flame graphs to find performance bottlenecks.
Reproduce the issue under load, then collect a 30-60 second CPU profile (the seconds query parameter of /debug/pprof/profile, which defaults to 30)
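Flame graph: box width is proportional to the share of samples in which that frame appears, so the widest frames are the hot paths; a wide frame whose children are all narrow is spending that time in its own code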
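Heap profile: the inuse_space view (the default for heap) shows memory still live as of the last GC; alloc_space shows everything allocated since the program started, including memory already freed. Switch views with -sample_index; the allocs profile is the same data with alloc_space as its default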
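Compare before/after with the -base flag: go tool pprof -base before.pb.gz after.pb.gz (positive values grew, negative values shrank; collect both profiles under the same load and for the same duration)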
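Security: never expose pprof endpoints on public-facing ports, since they reveal the process command line and stack traces and profile requests add load; serve them on an internal port or behind a VPN. Importing net/http/pprof registers its handlers on http.DefaultServeMux, so a public server that uses the default mux exposes them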
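To make the flame-graph width rule concrete, this toy model aggregates call stacks the way a flame graph does: each box is a stack prefix, its width is the number of samples passing through it, and its self count is the number of samples ending there. The stacks use the semicolon-separated folded format common to flame-graph tools; the function names and sample counts are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// exampleStacks is hypothetical sample data: one entry per CPU sample,
// frames ordered root;...;leaf.
var exampleStacks = []string{
	"main;handle;parseJSON",
	"main;handle;parseJSON",
	"main;handle;parseJSON",
	"main;handle;writeLog",
	"main;gc",
}

// boxWidths returns, for every stack prefix (one flame-graph box), how many
// samples pass through it (width) and how many end exactly there (self).
func boxWidths(stacks []string) (width, self map[string]int) {
	width, self = map[string]int{}, map[string]int{}
	for _, st := range stacks {
		frames := strings.Split(st, ";")
		for i := range frames {
			width[strings.Join(frames[:i+1], ";")]++
		}
		self[st]++
	}
	return width, self
}

func main() {
	width, self := boxWidths(exampleStacks)
	boxes := make([]string, 0, len(width))
	for b := range width {
		boxes = append(boxes, b)
	}
	sort.Strings(boxes)
	for _, b := range boxes {
		fmt.Printf("%-24s width=%d self=%d\n", b, width[b], self[b])
	}
}
```

Here main;handle is four samples wide with zero self time, while all three samples under main;handle;parseJSON are self time: the wide handle box points at the hot path, and the childless parseJSON box is where the CPU actually goes.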
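As a rough model of what -base reports, this toy subtracts a base profile's per-function values from a newer one. pprof itself operates on whole profiles rather than flat maps, so this version, with invented function names and millisecond values, only illustrates the sign convention (positive means more expensive, negative means cheaper) and why both profiles need the same load and duration: otherwise the deltas mostly reflect the collection window rather than the code change.

```go
package main

import (
	"fmt"
	"sort"
)

// diffProfiles returns after minus before for every function present in
// either map, following the sign convention of go tool pprof -base.
func diffProfiles(before, after map[string]int64) map[string]int64 {
	d := make(map[string]int64)
	for fn, v := range after {
		d[fn] += v
	}
	for fn, v := range before {
		d[fn] -= v
	}
	return d
}

func main() {
	// Hypothetical flat CPU time in milliseconds, from equal-length profiles.
	before := map[string]int64{"parseJSON": 300, "writeLog": 50}
	after := map[string]int64{"parseJSON": 120, "writeLog": 50, "compress": 40}

	d := diffProfiles(before, after)
	fns := make([]string, 0, len(d))
	for fn := range d {
		fns = append(fns, fn)
	}
	sort.Strings(fns)
	for _, fn := range fns {
		fmt.Printf("%-10s %+d ms\n", fn, d[fn])
	}
}
```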